(19) 



Europaisches Patentamt 
European Patent Office 
Office europeen des brevets 



1 



(12) 



(43) Date of publication: 

11.12.2002 Bulletin 2002/50 

(21) Application number: 02253371.5 

(22) Date of filing: 14.05.2002 



(H) EP 1 265 161 A2 

EUROPEAN PATENT APPLICATION 

(51) IntCI 7 : G06F 17/30 



(84) 


Designated Contracting States: 


(72) 


Inventors: 




AT BE CH CY DE DK ES Fl FR GB GR IE IT LI LU 


• 


Baskins, Douglas L. 




MC NL PT SE TR 




Fort Collins, Colorado 80524 (US) 




Designated Extension States: 


• 


Silverstein, Alan 




AL LT LV MK RO SI 




Fort Collins, Colorado 80525 (US) 


(30) 


Priority: 04.06.2001 US 874788 


(74) 


Representative: Jehan, Robert et al 








Williams Powell 


(71) 


Applicant: Hewlett-Packard Company 




4 St Paul's Churchyard 




Palo Alto, CA 94304 (US) 




London EC4M 8AY (GB) 



(54) System for and method of storing data 

(57) An adaptive digital tree data structure incorpo- 
rates a rich pointer object (104, 105, 110-112, 114-118), 
the rich pointer including both conventional address re- 
direction information (1 16B) used to traverse the struc- 
ture and supplementary information (116A) used to op- 



timize tree traversal, skip levels, detect errors, and store 
state information. The structure of the pointer is flexible 
so that, instead of storing pointer information, data may 
be stored in the structure of the pointer itself and thereby 
referenced without requiring further redirection. 
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Description 

[0001] The present invention relates generally to the 
field of data structures, preferably to a hierarchical data 
organization in which the structure of the data organiza- 
tion is dependent on the data stored and information is 
associated with pointers. 

[0002] The present invention is related to co-pending, 
European Patent Application No.(RJ/N12198), (RJ/ 
N12199), and (RJ/N12336), filed the same day as this 
application. 

[0003] Computer processors and associated memory 
components continue to increase in speed. As hardware 
approaches physical speed limitations, however, other 
methods for generating appreciable decreases in data 
access times are required. Even when such limitations 
are not a factor, maximizing software efficiency maxi- 
mizes the efficiency of the hardware platform, extending 
the capabilities of the hardware/software system as a 
whole. One method of increasing system efficiency is 
by providing effective data management, achieved by 
the appropriate choice of data structure and related stor- 
age and retrieval algorithms. For example, various prior 
art data structures and related storage and retrieval al- 
gorithms have been developed for data management in- 
cluding arrays, hashing, binary trees, AVL trees (height- 
balanced binary trees), b-trees, and skiplists. In each of 
these prior art data structures, and related storage and 
retrieval algorithms, an inherent trade-off has existed 
between providing faster access times and providing 
lower memory overhead. For example, an array allows 
for fast indexing through the calculation of the address 
of a single array element but requires the pre-allocation 
of the entire array in memory before a single value is 
stored, and unused intervals of the array waste memory 
resources. Alternatively, binary trees, AVL trees, b-trees 
and skiplists do not require the pre-allocation of memory 
for the data structure and attempt to minimize allocation 
of unused memory but exhibit an access time which in- 
creases as the population increases. 
[0004] An array is a prior art data structure that has a 
simplified structure and allows for rapid access of the 
stored data. However, memory must be allocated for the 
entire array and the structure is inflexible. An array value 
Is looked up "position ally," or "digitally," by multiplying 
the Index by the size (e.g., number of bytes) allocated 
to each element of the array and adding the offset of the 
base address of the array. Typically, a single Central 
Processing Unit (CPU) cache line fill is required to ac- 
cess the array element and value stored therein. As de- 
scribed and typically implemented, the array is memory 
inefficient and relatively inflexible. Access, however, is 
provided as 0(1), i.e., independent of the size of the ar- 
ray (ignoring disk swapping). 

[0005] Alternatively, other data structures previously 
mentioned including binary trees, b-trees, skiplists and 
hash tables, are available which are more memory effi- 
cient but include undesirable features. For example, 



hashing is used to convert sparse, possibly multi-word 
indexes (such as strings) into array indexes. The typical 
hash table is a fixed-size array, and each index into it is 
the result of a hashing algorithm performed on the orig- 
5 inal index. However, in order for hashing to be efficient, 
the hash algorithm must be matched to the indexes 
which are to be stored. Hash tables also require every 
data node to contain a copy of (or a pointer to) the orig- 
inal index (key) so you can distinguish nodes in each 
synonym chain (or other type of list). Like an array, use 
of hashing requires some preallocation of memory, but 
it is normally a fraction of the memory that must be al- 
located for a flat array, if well designed i.e., the charac- 
teristics of the data to be stored are well known, behaved 
and matched to the hashing algorithm, collision resolu- 
tion technique and storage structure implemented. 
[0006] In particular, digital trees, ortries, provide rapid 
access to data, but are generally memory inefficient. 
Memory efficiency may be enhanced for handling 
sparse index sets by keeping tree branches narrow, re- 
sulting in a deeper tree and an increase in the average 
number of memory references, indirections, and cache 
line fills, all resulting in slower access to data. This latter 
factor, i.e., maximizing cache efficiency, is often ignored 
when such structures are discussed yet may be a dom- 
inant factor affecting system performance. A trie is a tree 
of smaller arrays, or branches, where each branch de- 
codes one or more bits of the index. Prior art digital trees 
have branch nodes that are arrays of simple pointers or 
addresses. Typically, the size of the pointers or address- 
es are minimized to improve the memory efficiency of 
the digital tree. 

[0007] At the "bottom" of the digital tree, the last 
branch decodes the last bits of the index, and the ele- 
ment points to some storage specific to the index. The 
"leaves" of the tree are these memory chunks for spe- 
cific indexes, which have application-specific structures. 
[0008] Digital trees have many advantages including 
not requiring memory to be allocated to branches which 
have no indexes or zero population (also called an emp- 
ty subexpanse). In this case the pointer which points to 
the empty subexpanse is given a unique value and is 
called a null pointer indicating that it does not represent 
a valid address value. Additionally, the indexes which 
are stored in a digital tree are accessible in sorted order 
which allows identification of neighbors. An "expanse" 
of a digital tree as used herein is the range of values 
which could be stored within the digital tree, while the 
population of the digital tree is the set of values that are 
actually stored within the tree. Similarly, the expanse of 
a branch of a digital tree is the range of indexes which 
could be stored within the branch, and the population of 
a branch is the number of values (e.g., count) which are 
actually stored within the branch. (As used herein, the 
term "population" refers to either the set of indexes or 
the count of those indexes, the meaning of the term be- 
ing apparent to those skilled in the art from the context 
in which the term is used.) 
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[0009] "Adaptive Algorithms for Cache- Efficient Trie 
Search" by Acharya, Zhu and Shen (1999), the disclo- 
sure of which is hereby incorporated herein by refer- 
ence, describes cache-efficient algorithms for trie 
search. Each of the algorithms use different data struc- s 
tures, including a partitioned-array, B-tree, hashtable, 
and vectors, to represent different nodes in a trie. The 
data structure selected depends on cache characteris- 
tics as well as the fanout of the node. The algorithms 
further adapt to changes in the fanout at a node by dy- 10 
namically switching the data structure used to represent 
the node. Finally, the size and the layout of individual 
data structures is determined based on the size of the 
symbols in the alphabet as well as characteristics of the 
cache(s). The publication further includes an evaluation 15 
of the performance of the algorithms on real and simu- 
lated memory hierarchies. 

[0010] Other publications known and available to 
those skilled in the art describing data structures include 
Fundamentals of Data Structures in Pascal, 4th Edition; 20 
Horowitz and Sahni; pp 582-594; The Art of Computer 
Programming, Volume 3; Knuth; pp 490-492; Algorithms 
in C; Sedgewick; pp 245-256, 265-271 ; "Fast Algorithms 
for Sorting and Searching Strings"; Bentley, Sedgewick; 
"Ternary Search Trees"; 5871926, INSPEC Abstract 25 
Number: C9805-61 20-003; Dr Dobb's Journal; "Algo- 
rithms for Trie Compaction", ACM Transactions on Da- 
tabase Systems, 9(2):243-63, 1984; "Routing on long- 
est-matching prefixes"; 5217324, INSPEC Abstract 
Number: B9605-6150M-005, C9605-5640-006; "Some 30 
results on tries with adaptive branching"; 6845525, IN- 
SPEC Abstract Number: C2001 -03-6120-024; "Fixed- 
bucket binary storage trees"; 01998027, INSPEC Ab- 
stract Number: C83009879; "DISCS and other related 
data structures"; 03730613, INSPEC Abstract Number: 35 
C90064501 ; and "Dynamical sources in information the- 
ory: a general analysis of trie structures"; 6841374, IN- 
SPEC Abstract Number: B2001 -03-61 10-01 4, 
C2001 -03-6120-023, the disclosures of which are here- 
by incorporated herein by reference. 40 
[001 1] An enhanced storage structure is described in 
U.S. Patent Application Serial No. 09/457,164 filed De- 
cember 8, 1999, entitled "A FAST EFFICIENT ADAP- 
TIVE, HYBRID TREE," (the '164 application) assigned 
in common with the instant application and hereby in- 45 
corporated herein by reference in Its entirety. The data 
structure and storage methods described therein pro- 
vide a self-adapting structure which self-tunes and con- 
figures "expanse" based storage nodes to minimize 
storage requirements and provide efficient, scalable da- so 
ta storage, search and retrieval capabilities. The struc- 
ture described therein, however, does not take full ad- 
vantage of certain sparse data situations. 
[0012] An enhancement to the storage structure de- 
scribed in the '1 64 application is detailed in U.S. Patent ss 
Application Serial No. 09/725,373, filed November 29, 
2000, entitled "A DATA STRUCTURE AND STORAGE 
AND RETRIEVAL METHOD SUPPORTING ORDINAL- 



ITY BASED SEARCHING AND DATA RETRIEVAL", as- 
signed in common with the instant application and here- 
by incorporated herein by reference in its entirety. This 
latter application describes a data structure and related 
data storage and retrieval method which rapidly pro- 
vides a count of elements stored or referenced by a hi- 
erarchical structure of ordered elements (e.g., a tree), 
access to elements based on their ordinal value in the 
structure, and identification of the ordinality of elements, 
in an ordered tree implementation of the structure, a 
count of indexes present in each subtree is stored, i.e., 
the cardinality of each subtree is stored either at or as- 
sociated with a higher level node pointing to that subtree 
or at or associated with the head node of the subtree. 
In addition to data structure specific requirements (e.g., 
creation of a new node, reassignment of pointers, bal- 
ancing, etc.) data insertion and deletion includes steps 
of updating affected counts. Again, however, the struc- 
ture fails to take full advantage of certain sparse data 
situations. 

[0013] The present invention seeks to provide im- 
proved data processing, in the preferred embodiments 
to optimize performance characteristics of a digital tree 
and similar structures. 

[0014] According to an aspect of the present invention 
there is provided a data structure as specified in claim 1 . 
[0015] According to another aspect of the present in- 
vention there is provided a method of storing indexes in 
a data structure as specified in claim 6. 
[0016] According to another aspect of the present in- 
vention there is provided a computer memory as spec- 
ified in claim 10. 

[0017] The preferred system includes a data structure 
which is stored in the memory, can be treated as a dy- 
namic array, and is accessed through a root pointer. For 
an empty tree, this root pointer is null, otherwise it points 
to the first of a hierarchy of branch nodes. Each branch 
node consists of a plurality of informational or "rich" 
pointers which subdivide the expanse of the index (key) 
used to access the data structure. Each rich pointercon- 
tains auxiliary information in addition to, or in some cas- 
es instead of, the address of (that is, the pointer to) a 
subsidiary (child) branch or leaf node. This auxiliary in- 
formation permits various optimizations that result in a 
positive "return on investment" despite the space re- 
quired to store the information. 

[0018] An informational pointer may contain an ad- 
dress (the actual pointer to a child branch or leaf node); 
index digits (parts of keys) that help skip levels in the 
tree or bring leaf information to the present level; popu- 
lation counts that help rapidly count the numbers of valid 
(stored) indexes in the tree or in any subexpanse (range 
of indexes); and type information about the next level in 
the tree, if any, to which the pointer points. Pointers may 
also provide information for verifying operation and data 
integrity, and correcting errors. State information may 
also be bundled with pointers so that the resultant rich 
pointers provide state information. In this case, the data 



3 



5 



EP 1 265 161 A2 



6 



structure not only provides a means to store and manip- 
ulate data, but includes facilities supporting the process- 
es using the structure. The inclusion of this information 
allows the digital tree to be compressed in various ways 
that make it smaller, more cache-efficient, and faster to 
access and modify, even as the branch nodes are po- 
tentially no longer simply arrays of pointers to subsidiary 
nodes. This information also provides structure and re- 
dundancies that allow for faster access to and modifica- 
tion of the tree, as well as detection of data corruption. 
[0019] Embodiments of the present invention are de- 
scribed below, by way of example only, with reference 
to the accompanying drawings, in which: 

FIGURES 1A - 1E depict a digital tree which in- 
cludes a comparison between prior art pointers and 
an informational pointer for skipping levels in a data 
structure; 

FIGURE 2 is generalized diagram of an information- 
al pointer incorporating immediate storage of index- 
es; 

FIGURE 3 is a chart showing typical storage capa- 
bilities of informational pointers used to store imme- 
diate indexes; 

FIGURES 4A-4D are diagrams of rich pointers used 
to store 3, 2 and 1 byte immediate indexes on a 
32-bit system; 

FIGURES 5A-5H are diagrams of rich pointers used 
to store 7-1 byte immediate indexes on a 64-bit sys- 
tem; 

FIGURES 6A - 6D are diagrams of rich pointers 
used to store immediate indexes and associated 
values on a 64-bit system; 

FIGURES 7A - 7E depict a digital tree which in- 
cludes a comparison between indexes stored in leaf 
nodes and informational pointers used as immedi- 
ate indexes; and 

FIGURE 8 is a block diagram of a computer system 
in which the data structure may be implemented. 

[0020] As previously described, typical digital trees 
exhibit several disadvantages. These disadvantages in- 
clude memory allocated to null pointers associated with 
empty branches while exhibiting an increased number 
of memory references or indirections, and possibly 
cache line fills, as the size {i.e., "fanout") of the branches 
narrows to reduce the number of these null pointers. 
These disadvantages associated with digital trees have 
limited their use in prior computer applications. 
[0021] The described embodiment combines the ad- 
vantages of the digital tree with smarter approaches to 
handling both non-terminal nodes (branches) and ter- 
minal nodes (leaves) in the tree. These smarter ap- 
proaches minimize both memory space and processing 
time, for both lookups, insertions and modifications of 
data stored in the data structure. Additionally, it ensures 
the data structure remains efficient as indexes are add- 
ed or deleted from the data structure. The approaches 



used by this embodiment include forms of data com- 
pression and compaction and help reduce the memory 
required for the data structure, minimize the number of 
cache line fills required, and reduce access and retrieval 
5 times. 

[0022] The described system replaces the simple 
pointers typically implemented in digital trees with "rich" 
pointers (herein termed "informational pointers" and 
used interchangeably therewith) which associate addi- 

10 tional information with the redirection or address infor- 
mation of the pointers. This additional information may 
be used by the data structure and/or by processes ac- 
cessing the structure. The use of rich pointers within the 
digital tree permits various optimizations within the data 

15 structure. In a preferred embodiment of the invention 
each rich pointer in a digital tree branch includes multi- 
ple segments or portions, typically occupying two words 
(dependent upon the target platform). The rich pointer 
may contain an address (the actual pointer), index digits 

20 (parts of keys), population counts, type information con- 
cerning the next level to which the pointer "points" or is 
directed to within the tree, redundant data supporting 
error detection, state information, etc. 
[0023] One type of a rich pointer is a narrow-expanse 

25 pointer. In particular, one type of data compression that 
may be used when an expanse is populated by a "dense 
cluster" of indexes that all have some leading bits in 
common is supported by a narrow-expanse pointer, to 
the present invention. The typical representation of the 

30 common bits through multiple digital tree branches (or 
redundant bits in leaves) can be replaced by uniquely 
representing (e.g., encoding) the common bits as part 
of or associated with the pointer to the branch or leaf. 
In a preferred embodiment of the present invention, this 

35 type of data compression is limited to common leading 
whole bytes. The common bits are stored in a rich point- 
er and the pointer type indicates the level of and the 
number of remaining undecoded digits in the next ob- 
ject. The remaining undecoded digits imply the number 

40 of levels skipped by the narrow pointer. The rich pointer 
is stored (/.a, associated) with the pointer to the next 
level, which has an expanse smaller than it would oth- 
erwise. Preferably, each subexpanse pointer contains a 
"decode" field that holds all index bytes decoded so far 

45 except for the first byte. Narrow pointers provide a meth- 
od to skip levels within a digital tree which save memory 
used to store the indexes and reduces the number of 
memory references and cache fills required. 
[0024] FIGURES 1 A - 1 E depict the use of narrow- 

50 expanse pointers in a digital tree. (For the purposes of 
the present illustration, examples of the data structure 
are given with reference to a 32-bit word size platform 
wherein indexes are single words (as opposed, e.g., to 
character strings of arbitrary length) although it is un- 

55 derstood that the system is not so limited and, to the 
contrary, encompasses other word sizes and configura- 
tions including, but not limited to 16, 32, 64 and 128-bit 
word sizes.) As used herein, the term "slot" refers to a 
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record or group of cells of an array associated with, and/ 
or including a pointer to a child node or, more generally, 
a subexpanse of indexes, together with any data asso- 
ciated with the pointer. Generally, the array is "indexed" 
so that each cell or "slot" is associated with an offset 
value corresponding to an ordinal value of the slot within 
the array. Thus, in further detail, root pointer node 101 
is used for accessing the underlying data structure of 
the digital tree. Root pointer node 101 includes address 
information 102 diagrammatically shown as an arrow 
pointing to a first or "top" level node 1 03, in this illustra- 
tion, a branch node. (Note, the terminology used herein 
labels the top node of a tree pointed to by the root as 
"level 1", children of the level 1 node are designated as 
"level 2" nodes, etc. According to this convention, the 
level of any branch or leaf node is equal to one more 
than the number of digits (bytes) decoded in the indexes 
stored above that node. It is further noted that this con- 
vention, while representative, is for purposes of the 
present explanation and other conventions may be 
adopted including, for example, designating leaf nodes 
as constituting a first level of the tree. In this latter case, 
a preferred embodiment of the invention , the level of any 
branch or leaf node is equal to the number of digits 
(bytes) remaining to decode in the indexes stored at or 
below that node.) First level node 103 includes slots or 
enhanced pointer arrays for up to 256 lower level nodes 
and represents the entire expanse of the data structure, 
i.e. indexes 00000000 through FFFFFFFF hex by im- 
plementing a 256-way branch. (Note that, although a 
preferred embodiment decodes 1 byte of the index at 
each branch, other divisions of the index may be used 
including, for example, decoding 4 bits to implement a 
1 6-way branch at each level of the tree, etc.) First level 
node 1 03 includes first slot 1 04 (containing an adaptable 
object) which corresponds to expanse 
00000000-OOFFFFFF and last slot 105 which corre- 
sponds to a final expanse portion including indexes 
FFOO0O00-FFFFFFFF. The pointer contained in the 
pointer field in slot 1 04 points to a first one of 256 of the 
next level subexpanses (level 2 in the digital tree) while 
the pointer in slot 105 points to the most significant up- 
per 1 /256th of level 2. 

[0025] The first subexpanses of level 2 includes sub- 
sidiary node 108 in turn including an array of 256 point- 
ers directed to lower level nodes 119 and 120, As 
shown, the expanse covered by node 1 08 (i.e., an index 
range of 00000000-OOFFFFFF hex) is only sparsely 
populated by indexes falling within the subexpanse 
ranges covered by third level nodes 119 and 120 [i.e., 
00000000-OOOOFFFF and 00100000-001 0FFFF hex, 
respectively). Thus, while the pointers in slots 110 and 
112 include valid redirection information to (/*©., address 
of) nodes 119 and 120, the remaining 254 pointers of 
node 108, including the pointer in slot 111 covering an 
uppermost expanse range of OOFF000O-0OFFFFFF 
hex, are null pointers, i.e., have a special value reserved 
for pointers that are not directed to any target location 



or to empty nodes. Note that node 120 is similarly 
sparsely populated, with all indexes falling within a sin- 
gle subexpanse node 121 associated with a range of 
001 00200-001 002FF hex and pointed to by the sole ac- 
5 tive pointer in node 120, that is pointer 122. Thus, not 
only does node 120 require the allocation of additional 
storage space for 256 pointers, but access to indexes 
referenced by it to leaf nodes requires two indirections 
and therefore two cache fills. 

w [0026] Thus, as pictured, slot 110 contains a pointer 
to a level 3 slot which corresponds to 
000000-0000FFFF. Additionally, slot 112 contains a 
pointer which points to a separate subexpanse 120 of 
level 3 which correlates to 001 00000-001 0FFFF. Simi- 

15 iarly : slots within level 3 may further point to a subex- 
panse at level 4. Operationally, level 4 of FIGURES 1A 
- 1 E Is reached by consecutive decoding of one-byte 
portions of the index and traversing the tree in accord- 
ance with the decoded values. The first one byte (00) is 

20 used to identify slot 1 04 which contains the correspond- 
ing pointer to traverse the tree from level 1 to the corre- 
sponding portion of level 2 i.e., the node addressed by 
the pointer of slot 1 04. The next byte (1 0) is used to iden- 
tify slot 1 1 2 which contains the corresponding pointer to 

25 traverse the tree from node 1 08 to subsidiary node 1 20 
at level 3. The next byte (02) is used to identify slot 122 
which contains the corresponding pointerto traverse the 
tree from node 120 of level 3 to node 121 of level 4. 
Once at level 4, the remaining byte is used to access 

30 the appropriate slot of node 121 to retrieve the data as- 
sociated with the index value. As described, this process 
requires four separate memory references and poten- 
tially four different cache fills to identify the correct mem- 
ory address which corresponds to the index. 

35 [0027] If an expanse, or subexpanse, is sparsely pop- 
ulated with a small number of dense clusters of subsid- 
iary indexes, a rich pointer may be used to encode the 
common bits of the populated subexpanse or indexes. 
Still referring to FIGURES 1 A - 1 E, the upper 1 /256 sub- 

40 expanses of level 2 subsidiary node 109 contains a 
dense cluster of indexes which each lie within the range 
of FF1 00200-FF1 002FF. The other portions of the upper 
1/256 subexpanse, FF1 00000-FF1 001 FF and 
FF100300-FF10FFFF do not contain indexes. In this 

45 case, a rich pointer can be used to point directly to the 
level 4 portion of the subexpanse, skipping level 3 and 
eliminating the need for a memory reference or indirec- 
tion to level 3. Specifically, the corresponding slot 116 
contains a rich pointer node which includes an informa- 

50 tion data field 1 1 6A and a pointer node 1 1 6B to the next 
subexpanse or other structure for accessing the subsid- 
iary indexes. The information data field 116A includes 
the common bytes (i.e., index portion) of the remaining 
indexes, 02, because the remaining indexes all fall with- 

55 in the range of FF1 00200-FF1 002FF. 

[0028] In this case, the rich pointer is used to eliminate 
one of the memory references and possibly one cache 
fill. The first two bytes (FF) of the index are used to 
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traverse from the level 1 of the tree to the appropriate 
portion of level 2. Once at the level 2 node the rich point- 
er is used to traverse from the level 2 node directly to 
the level 4 node. 

[0029] The rich pointer structure encompasses at 
least two types of rich pointers or adaptable objects in- 
cluding a pointer type as described above and an im- 
mediate type. The immediate type supports immediate 
leaves or immediate indexes. That is, when the popula- 
tion of an expanse is relatively sparse, a rich pointer is 
used to store the indexes "immediately" within a digital 
tree branch, rather requiring traversal of the digital tree 
down to the lowest level to access the index. This format 
is akin to the "immediate" machine instruction format 
wherein an instruction specifies an immediate operand 
which immediately follows any displacement bytes. 
Thus, an immediate index or a small number of indexes 
are stored in the node, avoiding one or more redirections 
otherwise required to traverse the tree and arrive at 
some distant leaf node. Immediate indexes thereby pro- 
vide a way of packing small populations (or small 
number of indexes) directly into a rich pointer structure 
instead of allocating more memory and requiring multi- 
ple memory references and possible cache fills to ac- 
cess the data. 

[0030] A two word format of the preferred embodi- 
ment readily supports the inclusion of immediate index- 
es. Within the rich pointer, this is accomplished by stor- 
ing index digits in the information data field. A rich point- 
er implemented in a 32-bit system may store anywhere 
from a single 3-byte immediate index up to seven 1 -byte 
indexes, while a rich pointer in a 64-bit system may store 
up to 15 1-byte immediate indexes; The generalized 
structure of a rich pointer (also referred to as an adapt- 
able object) supporting immediate indexes is shown in 
FIGURE 2. The rich pointer includes one or more index- 
es "I", depending on the word-size of the platform and 
the size of the index, and an 8-bit Type field that also 
encodes the index size and the number of immediate 
indexes. 

[0031 ] As mentioned, the number of immediate index- 
es stored will depend upon the word-size of the indexes, 
upper levels within the tree nearest the root requiring 
larger indexes, smaller indexes being found as the tree 
is traversed toward the leaves. Examples of numbers of 
immediate index values of various sizes accommodated 
by 32-bit and 64-bit machines according to a preferred 
embodiment are presented in FIGURE 3 wherein index- 
es are mapped to valid/invalid indicators and have no 
associated values. FIGURES 4A-4D illustrate 3, 2 and 
1-byte index sizes stored in an immediate rich pointer 
structure implemented on a 32-bit platform, while FIG- 
URES 5A-5H illustrate index sizes of 7 through 1 byte 
implemented on a 64-bit machine. The structures of 
FIGURES 4A-4D and 5A-5H are also directed to an em- 
bodiment of the invention in which only the presence or 
absence of an index is indicated without any other value 
being associated with the indexes. 



[0032] FIGURES 6A - 6D illustrate another embodi- 
ment of the invention on a 64-bit machine wherein a val- 
ue is associated with each index l n . According to this 
embodiment, when a single immediate index I, of up to 
5 7 bytes is stored in a rich pointer structure, a 64-bit value 
associated with the index is also stored as shown in FIG- 
URE 6A. However, if more than one immediate index is 
to be stored, such as when an index may be represented 
by 3-bytes, 2-bytes or 1-byte indexes (FIGURES 6B - 
to 6C, respectively), then the first 8-byte word of the rich 
pointer is instead used as a pointer to values associated 
with the respective multiple indexes. A similar configu- 
ration is used to store values associated with indexes 
when the invention is implemented on a 32-bit machine. 
is [0033] Immediate indexes are packed into rich point- 
ers starting at the "first byte" (farthest from the type 
field), and possibly leaving some unused storage. An 
exception is present in a preferred embodiment where- 
in, if a single immediate index is stored, the indexes be- 
gin at the first byte of the second word to allow the first 
word to be a value area corresponding to the index, for 
those arrays that map an index to a value (see FIG- 
URES 4A, 5A and 6A). The structure of an ordinary leaf 
and the indexes portion of a rich pointer containing an 
immediate index are identical once the starting address, 
index size : and population are known. 
[0034] Thus, as described, an immediate index rich 
pointer structure may be thought of as including a small 
leaf. Such a structure is particularly helpful to represent 
a sparsely populated expanse where the indexes reside 
in the rich pointer itself. 

[0035] FIGURES 7A - 7E illustrate a comparison be- 
tween typical pointers and a rich pointer which can be 
used to store immediate indexes. The indexes would 
typically be stored in the portion of level 4 node 121 in 
the corresponding array cell or "slot." By using a rich 
pointer as an immediate index, the indexes that would 
otherwise reside in a leaf node such as leaf node 701 
are instead stored in the corresponding portion of a high- 
er level node, e.g., level 2 node 109. For a 64 bit system, 
one or more indexes can be stored in slot 1 1 6 of level 2 
node 109 in immediate index data field 702. As dia- 
gramed, slot 116 is logically divided into multiple sub- 
slots, each storing an immediate index. The use of rich 
pointers as immediate indexes avoids at least one mem- 
ory reference and one or more cache line fills. 
[0036] Another use of informational fields available 
with rich pointers is directed to storing state information 
associated with the object referenced by the pointer or 
otherwise describing and/or storing state information 
such as the state of the procedure accessing the struc- 
ture. Thus, while the tree itself is not a state machine, 
when combined with a specified index to insert, delete, 
or retrieve, it may be used as input to the accessing 
process that allows the code to operate similar to a state 
machine. Each tree subexpanse pointer includes a 
"type" field (e.g., 8-bits) that encodes one of a large 
number (e.g., 256) of enumerated object types. Levels 
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in the tree are directly encoded in the enumerations. 
Rich pointers allow rich pointer types which are a large 
number of very specific next-level object types. The tree 
traversal code can be thought of as a state machine 
whose inputs are the index/key to decode and the nodes 
of the tree. In a preferred embodiment, the state ma- 
chine algorithm provides software which appears to be 
a single large switch, allowing the software to act as a 
collection of small, fast code chunks, each pre-opti- 
mized to perform one task well with minimum run-time 
computations. Since the tree level is encoded in each 
pointer type, the traversal code does not need to track 
the current tree level. Instead, this state information is 
stored within the nodes of the tree itself. 
[0037] Rich pointers may also be used to detect errors 
in data, interpret and process data, etc., by providing 
information redundancy. That is, data characterizing the 
information stored as part of a rich pointer may be used 
to detect errors in the referenced data much as an error 
detection code might be used. This information may also 
be used to confirm, for example, position in a tree by 
encoding level or similar information in association with 
each pointer to form a rich pointer. 
[0038] In particular, in practice it is not feasible or de- 
sirable to compress all unused bits out of a rich pointer. 
Machine instruction efficiency is partially dependent on 
word and byte boundaries. While the ratio of cache fill 
time to CPU instruction time is sufficiently high that 
cache efficiency is important, it is generally still low com- 
pared to other data compression methods that, for ex- 
ample, are directed to minimizing disk reads. (Cache- 
efficient programs must balance CPU time against 
"complete" data compression.) The result of "incom- 
plete" compression is to provide and use some redun- 
dant data in rich pointers that allows tree traversal code 
to opportunistically, but very "cheaply," detect and report 
many types of data corruption in the tree itself, resulting 
either from tree management code defects or external 
accidents. In the preferred embodiment, cheaply detect- 
ed corruptions may result in a void pointer return, while 
"expensive" detections result in assertion failure in de- 
bug code only and are ignored in production code. The 
unused bits of a rich pointer may be used to opportun- 
istically determine various types of data corruption. As 
used herein, error detection data refers to any redun- 
dant data available to detect data corruption, whether 
or not that data is stored solely for the purpose of error 
detection or stored for other (functional) purposes but 
having a secondary use to detect errors. 
[0039] For example, an error condition may be iden- 
tified by checking to see that a pointer type matches the 
tree level. For example, it is inappropriate for certain ob- 
jects such as a "Leafl" object to appear at other than the 
lowest level of the tree furthest from the root. With ref- 
erence to FIGURES 7A - 7E, if the type field 703 con- 
tains an invalid value, such as 255, an invalid rich pointer 
type would be indicated and appropriate error process- 
ing performed. 



[0040] Another check is performed for decode bytes 
in subexpanse pointers which include already-decoded 
index bytes that are not required as part of a narrow 
pointer, but nonetheless must match the path taken to 
5 this point in the tree. It is more efficient and simpler to 
store already-decoded index bytes this way then to 
somehow optimize to storing only required narrow- 
pointer bytes. 

[0041] Rich pointers also allow computational effi- 
10 ciencies. In particular, when a single immediate index is 
stored in a rich pointer, there is room (e.g., in the Decode 
field) to store all but the first byte of the index, not just 
the remaining undecoded bytes. This allows faster 
traversal and modification. Like decode bytes, these re- 
15 dundant bytes must agree with the path traversed to the 
immediate index. 

[0042] Rich pointers also support pointer portability. 
That is, when a narrow-expanse pointer indicates only 
the level of the subsidiary node, ratherthan the number 

20 of levels being skipped, it remains "portable". Like any 
rich pointer that "knows" about the object to which it re- 
fers but not about the object in which it resides, a port- 
able narrow-expanse pointer allows easier branch in- 
sertion and deletion when an "outlier" index is inserted 

25 or deleted. (An outlier is an index that belongs under the 
full subexpanse of theslot occupied by the narrow-ex- 
panse pointer, but not under the present narrow ex- 
panse of that pointer.) 

[0043] FIGURE 8 is a diagram of a computer system 

30 capable of supporting and running a memory storage 
program implementing and maintaining a data structure 
as taught herein. Thus, although the structure is adapt- 
able to a wide range of data structures, programming 
languages, operating systems and hardware platforms 

35 and systems, FIGURE 8 illustrates one such computer 
system 800 comprising a platform suitable to support 
the structure. Computer system 800 includes Central 
Processing Unit (CPU) 801 coupled to system bus 802. 
CPU 801 may be any general purpose CPU, such as an 

40 HP PA-8500 or Intel Pentium processor. However, the 
system is not restricted by the architecture of CPU 801 
as long as CPU 801 supports the operations as de- 
scribed herein, e.g., the use of pointers. System bus 802 
is coupled to Random Access Memory (RAM) 803, 

45 which may be SRAM, DRAM or SDRAM. ROM 804 is 
also coupled to system bus 802, which may be PROM, 
EPROM, or EEPROM. RAM 803 and ROM 804 hold us- 
er and system data and programs as is well known in 
the art. 

so [0044] System bus 802 is also coupled to input/output 
(I/O) controller card 805, communications adapter card 
811 , user interface card 808, and display card 809. The 
I/O card 805 connects to storage devices 806, such as 
one or more of a hard drive, a CD drive, a floppy disk 

55 drive, a tape drive, to the computer system. Communi- 
cations card 811 is adapted to couple computer system 
800 to network 81 2, which may be one or more of a tel- 
ephone network, a Local (LAN) and/or a Wide-Area 
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(WAN) network, an Ethernet network, and/or the Inter- 
net network and can be wire line or wireless. User inter- 
face card 808 couples user input devices ; such as key- 
board 813 and pointing device 807, to computer system 
800. Display card 809 is driven by CPU 801 to control 
display device 810. 

[0045] The disclosures in United States patent appli- 
cation No.09/874,788, from which this application 
claims priority and in the abstract accompanying this 
application are incorporated herein by reference. 



Claims 

1 . A data structure for storage of indexes in a compu- 
ter memory, including: 

a hierarchy of branch nodes (103, 108, 109, 
119, 120, 121, 123) ordered into a plurality of 
levels beginning with a top level branch (103), 
each of said branch nodes including an array 
of adaptable objects (104, 105, 110, 112, 111, 
114-118) each associated with a subexpanse 
of said indexes mapped by a respective one of 
said branch nodes, said adaptable objects 
each including a type field (T) indicating a type 
of said adaptable object, said type including a 
pointer type in which said adaptable object is 
configured to include a pointer (1 1 6B) to anoth- 
er node and an information data field (116A) 
configured to store information about said other 
node, and an immediate type in which at least 
one of said indexes is stored in said adaptable 
object. 

2. A structure according to claim 1 , wherein said infor- 
mation data field (1 1 6A) represents an index portion 
common to subsidiary ones of said indexes such 
that a subsidiary node (123) is more than one level 
lower in the data structure than its parent (1 09) and 
does not encode a common portion of said subsid- 
iary indexes. 

3. A structure according to claim 1 or 2, wherein a 
pointer field (116B) associated with one of said 
adaptable objects (116) of said pointer type at one 
level of said data structure is directed to another of 
said branch nodes (1 23) at another level of said da- 
ta structure that is removed from said one level by 
at least two levels and said information data field of 
said adaptable object includes a portion of a plural- 
ity of said indexes common to all of said indexes in 
a subexpanse associated with said adaptable ob- 
ject. 

4. A structure according to claim 1 , 2 or 3, including a 
plurality of leaf nodes (701) associated with one or 
more of the indexes, wherein a pointer field associ- 



ated with one of said adaptable objects of said 
pointer type at one level of said data structure is di- 
rected to one of said leaf nodes at another level of 
said data structure that is removed from said one 
5 level by at least two levels and said information data 
field of said adaptable object includes a portion of 
a plurality of said indexes common to all of said in- 
dexes in a subexpanse residing in said one leaf 
node. 

10 

5. A structure according to any preceding claim, 
wherein at least a part of a aid adaptable object in- 
cludes an immediate index data field (702) config- 
ured to represent at least a portion of at least one 
*5 subsidiary index, such that said subsidiary index is 
immediately present without further indirection 
through a pointer to a different location in said com- 
puter memory. 

20 6. A method of storing indexes in a data structure, in- 
cluding the steps of: 

defining a data structure including a hierarchy 
of branch nodes (103, 108 : 109, 119, 120, 121, 

25 1 23) ordered into a plurality of levels beginning 

with a top level branch (103), each of said 
branch nodes including an array of adaptable 
objects(104, 105, 110, 112, 111, 114-118) each 
associated with a subexpanse of said indexes 

50 mapped by a respective one of said branch 

nodes, said adaptable objects each including a 
type field (T) indicating a type of said adaptable 
object, said type including a pointer type (11 6B) 
in which said adaptable object is configured to 

35 include a pointer to another node and an infor- 

mation data field (1 16A) configured to store in- 
formation about said other node, and an imme- 
diate type in which at least one of said indexes 
is stored in said adaptable object; and storing 

*o the indexes in the data structure. 

7. A method according to claim 6, including the step 
of defining said data structure to include a plurality 
of leaf nodes (701 ) associated with one or more of 

45 the indexes, wherein a pointer field (1 04) associat- 
ed with one of said adaptable objects of said pointer 
type at one level of said data structure is directed 
to one of said leaf nodes (701) at another level of 
said data structure that is removed from said one 

so level by at least two levels and said information data 
field of said adaptable object includes a portion of 
a plurality of said indexes common to all of said in- 
dexes in a subexpanse residing in said one leaf 
node. 

55 

8. A method as in claim 6 or 7, including the step of 
configuring at least a part of a said adaptable object 
to include an immediate index data field (702) rep- 
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resenting at least a portion of at least one subsidiary 
index, such that said subsidiary index is immediate- 
ly present without further indirection through a 
pointer to a different location in said computer mem- 
ory. 5 

9. A method as in claim 6, 7 or 8, including the steps 
of storing error detection data (703) in said adapta- 
ble objects and detecting an error condition within 
said data structure using said error detection data. 10 

10. A computer memory (803) for storing data for ac- 
cess by an application program executed on a data 
processing system, including: 

15 

a data structure stored in said memory for stor- 
age of indexes, said data structure including a 
hierarchy of branch nodes (103, 108, 109, 119, 
1 20, 1 21 , 1 23) ordered into a plurality of levels 
beginning with a top level branch (103), each 20 
of said branch nodes including an array of 
adaptable objects (104, 105, 110, 112, 111, 
114-118) each associated with a subexpanse 
of said indexes mapped by a respective one of 
said branch nodes, said adaptable objects 25 
each including a type field (T) indicating a type 
of said adaptable object, said type including a 
pointer type (1 1 6B) in which said adaptable ob- 
ject is configured to include a pointer to another 
node and an information data field (11 6A) con- 30 
figured to store information about said other 
node, and an immediate type in which at least 
one of said indexes is stored in said adaptable 
object. 

35 
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Abstract 



IP address lookup at the router is a complex 
problem. Tliis has been primarily due to the 
increasing table sizes, growth in traffic rate and 
high link capacities. In this work we have 
proposed an algorithm for fast routing lookup 
with reduced memory utilization and access time, 
litis approach shows significant performance 
improvement in the average case and optimizes 
the overall time taken for packet routing. Since 
storage requiremetu, processing time and number 
of lookups performed are reduced, power 
consumption by the router is also reduced. Our 
simulation result indicates that the proposed 
technique works approximately 4.11 times better 
than the standard LC Trie approach in the 
average case. 



1. Introduction 

Due to the technology advancement, the 
packet transmission rate is ever growing. With 
increasing number of hosts and sessions in the 
Internet, the processing of packets at the routers 
has become a major concern. While the current 
technology supports fast switching, packet 
forwarding is still a bottleneck. This is either due 
to a high processing cost or large memory 
requirement [9]. 

The routing tables implemented in most 
routers today generaUy follow the software-based 
approach, as they are more flexible and adaptive 
to any changes in the protocol. The hardware 
solutions are also not scalable and hence with the 
emergence of the IPv6, these approaches are 
nearly impractical. The software approaches 
expect to gain speed as the processing rate is 
doubled every 18 months (Moore's law). 

From early 90's to late 90's, the number of 
entries in the lookup table has changed from 
linear to super linear. In 2002, the backbone 
router contains approximately 100,000 prefixes 



and is constantly growing [9], A lookup engine 
deployed in the year 2002 should be able to 
support approximately 400,000-512,000 prefixes 
in order to be useful for atleast another 3 years. 
Thus the lookup algorithms must be able to 
accommodate future growth. 

Apart from the lookup entries the speed of the 
links are doubling every year. The links running 
on OC768c (approximately 40Gbps) require the 
router to process 125 million packets per second 
(Mpps) (assuming minimum sized 40 byte 
TCP/IP packets) [9]. For applications that do not 
require quality of service, a lookup or 
classification algorithm that performs well in the 
average case is desirable. This is so because the 
average lookup performance can be much higher 
than the worst-case performance. For such 
applications the algorithm needs to process 
packets at the rate of 14.1 Mpps for OC768clinks, 
assuming an average Internet packet size of 
approximately 354 bytes [9]. 

2. Research Objectives 

The main motivation behind our approach is as 
follows: 

1) Reducing the lookup time: Many of the trie- 
based approaches like [1] [2] take 6-7 memory 
references for lookup, while hash technique 
combined with Tries in [3] takes 5 memory 
references. Our goal is to modify the trie-based 
approach to reduce the number of memory 
references for each lookup. 

2) Reduce routing table storage: Implementing 
the routing table using trie [4] could be an 
expensive process. In this work our aim is to 
reduce the storage requirements by eliminating 
unnecessary and redundant data especially when 
the number of nodes are large. 

3) Ability to handle large routing tables: Most 
of algorithms support routing table sizes for the 
current needs. However an algorithm needs to be 
scalable so that it supports the routing table 
storage requirements for atleast the next 3 years. 
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3. Previous work on lookup algorithms 

Wc have classified the existing approaches into 
Trie and Non-Trie based approaches. We briefly 
analyze the different approaches with respect to 
lookup time and memory utilization. 

Table 1. Binary Strings to be stored in a Trie 
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3*1. Non-Trie based approaches 

Linear search is simplest data-structure with the 
linked list of all prefixes in the forwarding table. 
Storage and time complexity is O (N). With the 
emergence of IPv6 this approach is almost impractical 
to continue. Binary search algorithms include the 
classical methods like the hashing and tree based 
approaches. These techniques cannot distinguish 
matches based on prefix length. However [3] takes care 
of this constraint and proposes a modified binary search 
technique that uses log 2 (2N), where N is the number of 
routing table entries. These algorithms are also 
computation intensive and require a large storage 
especially as the lookup table grows. Caching is a 
better technique than other memory reference 
techniques, as it is faster [10] [6]. A fully associative 
memory, or content-addressable memory (CAM) [8] 
[9], can be used to perform an exact match search 
operation in hardware in a single clock cycle. CAM 
implementation is a costly mechanism and is not 
feasible. Thus the storage requirements for these 
become a limiting factor. Also CAM based approaches 
are for fixed length prefixes. A better solution is to use 
a ternary-CAM (TCAM), a more flexible type of CAM 
that enables comparisons of the input key with variable 
length elements. 

3.2. Trie based approaches 

Trie [4] is a general-purpose data structure for storing 
strings. Each string (prefix) in the routing table is 
represented by a leaf node in the trie. The longest 
prefix search operation on a given destination address 
starts from the root node of the trie [9], Patricia trie is 



a modification of a regular trie. The difference is that it 
has no one-degree nodes. Each branch is compressed to 
a single node in a Patricia tree. Thus, the traversal 
algorithm may not necessarily inspect all bits of the 
address consecutively, skipping bits that formed pan of 
the label of some previous trie branch. Patricia tree 
loses information while compressing chains: the 
bit-string represented by the other branches of the 
uncompressed chain is lost. This is overcome by a 
path compression technique that records by 
maintaining a skip value at each node that 
indicates how many bits have been skipped in the 
path [5]. The statistical property of this trie 
(Patricia trie) indicates that it gives an asymptotic 
reduction of the average depth for a large class 
distribution [11], However when the trie is 
densely distributed this approach fails in terms of 
storage and processing for lookup. Level 
compressed trie (LC-trie) [1] is a modification to 
this scheme for densely populated tries. An LC- 
trie is created by path compressing a binary trie 
and expanding every node rooted at a complete 
sub-trie of maximum depth to create a 2 k -degree 
node [9]. This expansion is done recursively on 
each subtrie of the trie. It replaces the i highest 
complete levels of the binary trie with a single 
node of degree 2\ All the internal nodes 
represented in this trie contain no information but 
pointers to the first child. Hence a separate vector 
is needed to store all possible prefix in a prefix 
vector in case of a failure in search. Also since 
each node is traversed again by backtracking in 
case of failure this is an inefficient method both in 
terms of processing time and storage. Thus all the 
variants of the trie-based approach are not storage 
efficient and require a lot of processing. 

4. Modified LC-Trie 

In this section we describe a scalable time 
efficient level compression technique. The 
approach presented here is motivated to minimize 
the access time and memory utilization during 
routing table lookup, to transfer packets to the 
appropriate Ethernet port. A few approaches like 
[1] have been proposed to better storage and time 
complexity. Our approach is based on the LC-trie 
approach for compressing the trie. We use the 
same compression technique as in LC-trie. 
However we try to avoid additional storage (LC- 
Trie uses additional data structures like base, 
prefix and nexthop vector to store prefixes) and 
processing (like backtracking), which are the 
major flaws of LC-Trie approach. Figure 1 
represents the tree corresponding to Table 1 for 
the proposed approach. From the Figure 1 we see 
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that unlike the LC-lrie approach our approach 
stores all the prefixes either in the internal nodes 
or leaf nodes. 

The algorithm first converts the routing table 
entries into a binary trie. Then path compression 
is done on this trie to reduce its depth. This path- 
corn pressed trie is now level compressed by 
storing the sub strings at the internal nodes and 
the strings at the leaves. When the routing table is 
built we use a FILL FACTOR (this represents the 
maximum number of branches that each node can 
have during the build) to help make the future 
updates easier. 
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Figure 1. Scalable Time Efficient Level Compression 
4.1 Storage data structure 

Each node of our trie is represented as in Figure 
2. Following is a brief description of the 
significance of each of the fields in the data 
structure. 



mi 


2 ' • 


3 


4 


5 


fr»4 






4bits7bils 5bils 16bils 
< 


32 bits 


► 

► 



64 bits 

Figure 2. Proposed Data structure for node in routing 
table (IPv4) 

Branching factor [0:3]: This indicates the number 
of descendents of a node. This is a 4-bit value and 
a maximum of 16 branches to a single node can 
be represented. 

Skip value [4:10]: This indicates the number of 
bits that can be skipped in an operation. This is a 
7-bit value and a maximum of 128-bit skip can be 
represented. 

Fon [11:15]: This represents the output port for 
the current node in case of a match. This is a 5 -bit 
value and this field can represent a maximum of 
32 output ports. 

Pointer [16:31]: This is a pointer to the leftmost 
child in case of an internal node and NULL in 



case of a leaf node. This is a 16 bit value and can 
represent a maximum of 65536 prefixes. The 
current implementation assumes the number of 
routing table entries to be less than 65536. 
However a scalable solution to consider more 
than 5,000,000 prefixes is discussed later. 
Siring [32:63 1: This represents the actual value of 
the prefix the node represents. The current 
implementation assumes a 32 -bit value (IPv4) 
though it can be extended to 128-bit value (IPv6). 

4.2 Algorithm 



node a T|01; s= tc&tdataRcJ; node « tab]c->tricf0]; 
pos GETSKIP(oode); branch » GETBRANC1 J(aode); 
adr^OETADR<node); 
prefix**)-, result--!;/* stared in Regisier */ 
whtlc< branch 0) /* Not leaf node */ 
{ node = table->trieladr + EXTR ACT(pos, branches)}; 
iftpos) 

rtrefix«pos; /* skip *pos* bits of the prefix */ 
if (hriinch > n) 

prefix- prefi x « <m+ 1 )| branch: 
/* n=2 m - l.mis the number of bits 
representing the branch */ j 
if(GfiTSTRING(nodc) A prcftx) • 

break: /* Previous node contains largest prefix and 
interface stored in result ii 

else 

{ pos = pos + branch + GETSKIP(node); 
branch = GETBRANCH(oode): 
adr = GBTADR(node); 
result = GETPORT< node): ) 



The search algorithm forms the bottleneck of the 
entire routing process and hence this needs to be 
designed very efficiently. Algorithm discussed 
above is used to search a string s in the routing 
table. EXTRACT (p,b,s) is used to search s in the 
routing table, where b is the number of bits 
starting at position p. Let the array representing 
the tree be T. The root is stored in T[0], 
Each entry in Table 2 represents a node in the 
proposed approach, for routing table described in 
Table 1 , with the corresponding branch, skip and 
pointer values. In addition to these three fields 
each node also has a 32-bit (IPv4) prefix 
represented by the node, which is not indicated in 
the table. 

4.3 Working of the algorithm 

The working of this algorithm is illustrated with 
an example. Consider the input string 10 1001. We 
start from root node number 0. We see that the 
branching factor is 3 and skip value is 0 and 
hence extract 1 st 3 bits from the search string. 
These 3 bits represent a value of 5, which is 
added to the pointer, leading to position 6 in the 
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array. At this node the branching value is 2 and 
the skip value is 0 and hence we extract the next 2 
bits. They have the value 0. However we check if 
the string (101 ) matches the prefix (101). Since it 
matches the search continues further. We now add 
the value of 0 to the pointer and arrive at position 
13. At this node the branching factor is 1 and 
skip value is 0. They have a value of 1. We again 
compute to see if the string (10100) is same the 
prefix (10100). Since it matches we continue and 
add the value of 1 to 19 to obtain the pointer 20. 
Now see that this node represents a leaf node 
since the branching factor is 0. We now check to 
see if the string (101001) matches the prefix 
(101011). Since they don't match we use the 
previous value of the output port from the register 
to route the packet. The Figure 3 represents the 
path taken during the search. 

Table 2. Array representation our approach 
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Figure 3. Trie traversal for the string 101001 

Thus we see that this approach doesn't traverse 
the entire trie in case the string is not present. 
Also there is no separate storage for the prefixes 
as in case of LC-Trie. This approach checks for 
the match in the string at every step. However this 
is not computation intensive since the string to 
compare is already present in the cache and a xor 
on the bits could give us the result of comparison. 



In the LC-trie approach after we traverse through 
the trie we perform a check for the string match in 
the base vector, which uses hashing technique, 
consuming at least on« memory fetch. If there is a 
mismatch a check is done again on the prefix 
table and this requires hashing to < heck for a 
prefix match, which again requires another 
memory fetch. Thus compared to the LC-trie we 
save atleast two memory cycles for every routing 
lookup performed. 

4.4. IPv6 Compatibility 

The algorithm can easily be extended to IPv6 and 
to allow a maximum of 2 37 entries. This ensures 
that the proposed data structure can support 
routing entries beyond 2005. This data structure 
also allows handling of a maximum of 1024 
interfaces. The data structure for each node is 
described in the Figure 4. 
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Figure 4. Proposed Data structure for node in 
routing table (IPv6) 

5. Simulations & Experimental Setup 

To test and verify our approach with the LC-Trie 
approach we have modified the test bed used by 
the autliors of [1], The features of this modified 
test bed are as follows: 

1. It reads routing data from the routing table file, 
which is in a predefined format as discussed in 
the paper [1]. The routing file is an exhaustive list 
of routing entries (65536 entries for 16-bit pointer 
value). 

2. The algorithm can be run over a number of 
times by specifying n as a command line 
argument. As the number of iterations increase, it 
gives a good estimate of the parameters under 
comparison. 

3. Quick sort algorithm is used to sort the routing 
table entries. 

4. We have also used two different approaches to 
compare the performance. One uses a function 
call to the search algorithm and the other is an 
inline function. However we have used the inline 
function results for our comparison. 

In our implementation we have used routing 
tables similar to that provided by the Internet 
Performance Measurement and Analysis project 
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obtained is art estimate of power for bne^terajlion. 
„The resuits_show_that proposed approach aricLthe 
LC-Trie approach consumes 4.61 5raW and 
4.755mW for 20 lookups respectively (Table 5). 
This is a reduction of ().14mW for 20 k&okups. 
This is directly related to the fact that the Biime for 
lookup is less. The routing entries fin our 
simulations are assumed to be stored in the 
DRAM and the storage power corresponding to 
that was computed for both the approaches. Since 
the storage requirements are reduced by factor of 
2.38, it is expected that power consumptions will 
be less by that amount. . 

Table 3. Timing 
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Table 4. Memory Utilization 
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Table 5. Power consumed 
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7. Conclusion & Future work 

We have proposed a scalable and efficient 
algorithm for fast lookup based on modified LC- 



J Tric Uechn^uer 'Thjs^fprithm^^ 
— four-time&4)ettep^ 
average case ' as compared to fo£ "EjC-Trie 
approach. ;\ m 

The proposed algorithm does approximately 6.6 
million -lookups per second on a 32 bit, 200Mhtz 
processor without considering caching 6f packets. 
Since the packet have certain amount of locality 
in ihem, caching cbuldlead to better perfomance. 
The search algorithm '-fonns. the bottleneck in the 
lookup process. Hence epmputatiem, kernels can 
be obtained and optimized, by implementirig at the 
hardware level. One sucU methbdfls *by* creating \ 
tie instructions (ih % Xtensa)^ ^r the lr set of ^ 



instructions that are executed more frequently, 
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rn>;^>o^i^ ; t^'c6nip>re. theij^pdified technique 
i witn LQ-TiT'c approach the traffic was simulated 
and we used a random permutation of all possible 
entries in the routing table. The time 
measurements have been.p^rforrned on sequences 
of lookup operations, , where cadi lookup includes 
fetching the address from • the array, performing 
the routing table lookup, accessing the nexthop 
table and,, assigning yhe result to a volatile 
variable, Some of the .entries in the routing tables 
contain multiple nexthops. In. such cases we select 
the first one .listed as the nexthoptaddress for the 
routing tobleysin,cejwe only, consider one nexthop 
address peij entry in the routing table. However 
for entties:that didn't contain .a nexthop address a 
special address; that i s different from the ones 
found in rowndf table was used. . .. 

'Bf^y Wi "f %?tions were'" used in the 
.coniputauon of aver% and standard deviation'of 
the samples (tj). \ h 
Average Time (avg) = $k . ' 
Std Deviation (std) = ( ti ? -nW'avg) 1 " /( n -l) 
Parameters analyzed* ' 
We have analyzed the\ effectiveness of our 
approach- and compare our "approach with me LC- 
Tne approach with respect %ming, storage and 
power consumption. The parWers considered 
in each of the cases are describe^elovv. " K J 

5.1. Timing ."?'. 

mi 1 S"^" 6 : Time taken 10 B^%ng table 
(B,): This is the time taken for the a&orithm to 

retrieve all the data from the Routing Sble file 
sort them and build them :wim.appropriate%ntries 
for future referencing. t 
Time taken to b|ild nexthop table (N t ): This is the 
time taken to compute all the next hop addresses 
from the routing table data. 

5.1.2 Sorting. Time taken y to Sort the entries (SO- 
Based on a seed value thetouting table entries are 
stored in a temporary data'Stnicttire in a random 
fashion, which is then sorted" for building the 
routing table, 

5.1.3 Searching. 

Function Search: Time taken to search the string 
(using call to a function) based on n iterations. 
Fmu,: Minimum time taken to search the string 
using function call. 

F,vg: Average time taken to.searchJhe string using 
function ca|l.v _j- 6 

F *< : Standard deviation of the times for searching 
a string using* function* call, 
Fip,: Average number of lookups/second using 
function call. 

Inline Search: Time taken to search the string 
(using an inline function) based on n iterations. 



•'W -Minimum, time taken to search the string 
using Inline function call. 
I avg : Average time taken to search the string using 
Inline function call. 

I«d : Standard deviation of the times for searching 
a string using Inline function call. & 
V Average number of lookups/second using 
inline function call. b 
5.2 Memory Utilization 

B m : Memory utilization in bytes for the base 
vector. 

P m : Memory utilization in bytes for prefix vector. 
N m : Memory utilization in bytes for nexthop 
vector. 

Trie CT„,): Memory utilization for. Trie. 

Following are the results for the comparison of 
|^-Tne approach, and proposed approach with a 
fi if.3i tor - p l a5 (*» »s a good value based on the 
experimental results considering future updates) 
and a.fixed branch at root (16). We have run this 
algonthm 100 times to get a good estimate of the 
values. This was run on an Intel Pentium II 
processor, 400Mhz and 256 MB RAM. -The 
programs were written in C and complied with 
gec compiler using optimization level -04. 
From Table 3 we observe that time taken to build 
ra einejs'reducedbyA14 seconds. This is mainly 
; . cue l ? % ? a 5t that no additional computation is 
required to build the base and prefix vector. Also 
there is no overhead of^building the nexthop table 
- he simulation results show that the proposed 
approach, works 3.28 times better than LC-Trie 
approach when the prefix search, is implemented 
as a funaion search and 4,11 times better when 
.implemented as an inline function. '"Ihusi for 
, above mentioned system configuration we were 
.gfe to achieve a lookup of approximately 6.6 
Mpps m the average case. From Table 4 we 
observe that the proposed approach- avoids the 
storage for base, prefix and nexthop vector and 
hence" occupies 2.38 times lesser storage. Though 
the reduction in storage for the nexthop vector is 
not significantly high, the storage for the base and 
prefix vector is greatly reduced. 
The processing power savings Tor the two 
approaches were compared using an 
implementation based on reconfigurable 
processor architecture from Tensilica (16/24 bit 
Xtensa ISA, 200Mhz. 0.18 urn technology 0 7 
mm core area. 0.8mW/MHz core power 
dissipation). The Routing table used in' our power 
analysis is described in Table 1. The result 
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